W-POS language model and its selecting and matching algorithms
QIU Yunfei, LIU Shixing, WEI Haichao, SHAO Liangshan
Journal of Computer Applications, 2015, 35(8): 2210-2214. DOI: 10.11772/j.issn.1001-9081.2015.08.2210

The n-grams language model trains a classifier on text features composed of word combinations. However, it contains many redundant words, and a large amount of sparse data is generated when n-grams are matched or quantified against test data, which seriously degrades classification precision and limits the model's application. Therefore, an improved language model named W-POS (Word-Parts of Speech) was proposed based on the n-grams language model. After word segmentation, parts of speech were used to replace words that rarely appeared or were redundant, so that the W-POS language model was composed of both words and parts of speech. Selection rules, a selecting algorithm, and a matching algorithm for the W-POS language model were also put forward. The experimental results on the Fudan University Chinese Corpus and 20Newsgroups show that the W-POS language model not only inherits the advantages of n-grams, including reducing the number of features and carrying partial semantics, but also overcomes its shortcomings of producing large amounts of sparse data and containing redundant words. The experiments also verify the effectiveness and feasibility of the selecting and matching algorithms.
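A minimal sketch of the substitution step described in the abstract: rare words in a segmented, POS-tagged corpus are replaced by their part-of-speech tags, yielding features that mix frequent words with POS tags. The corpus format, tag set, and frequency threshold below are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def build_w_pos_features(tagged_docs, min_count=2):
    """tagged_docs: list of documents, each a list of (word, pos) pairs.

    Frequent words are kept as-is; rare words are replaced by their
    POS tag, producing mixed word/POS feature sequences.
    """
    word_counts = Counter(word for doc in tagged_docs for word, _ in doc)
    features = []
    for doc in tagged_docs:
        # Keep words seen at least min_count times; back off to POS otherwise.
        features.append([word if word_counts[word] >= min_count else pos
                         for word, pos in doc])
    return features

# Hypothetical usage with made-up (word, POS) pairs:
docs = [[("model", "n"), ("trains", "v"), ("classifier", "n")],
        [("model", "n"), ("quantifies", "v"), ("data", "n")]]
print(build_w_pos_features(docs))
# [['model', 'v', 'n'], ['model', 'v', 'n']]
```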

Feature transfer weighting algorithm based on distribution and term frequency-inverse class frequency
QIU Yunfei, LIU Shixing, LIN Mingming, SHAO Liangshan
Journal of Computer Applications, 2015, 35(6): 1643-1648. DOI: 10.11772/j.issn.1001-9081.2015.06.1643

Traditional machine learning faces a problem: when the training data and test data no longer obey the same distribution, a classifier trained on the training data cannot classify the test data accurately. To solve this problem, following the transfer learning principle, the features shared by the source and target domains were weighted according to an improved measure of their distribution similarity, while semantic similarity and Term Frequency-Inverse Class Frequency (TF-ICF) were used to weight the non-shared features in the source domain. A large amount of labeled source domain data and a small amount of labeled target domain data were used to quickly obtain the features required for building a text classifier. The experimental results on the text dataset 20Newsgroups and the non-text dataset UCI show that the feature transfer weighting algorithm based on distribution and TF-ICF can transfer and weight features rapidly while guaranteeing precision.
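A minimal sketch of TF-ICF weighting as named in the abstract, assuming documents grouped by class label. The formula variant used here, tf multiplied by log(C / cf) where C is the number of classes and cf the number of classes containing the term, is a common form and an assumption; the paper may normalize or smooth differently.

```python
import math
from collections import Counter

def tf_icf_weights(class_docs):
    """class_docs: dict mapping class label -> list of tokenized docs."""
    num_classes = len(class_docs)
    # Per-class term frequencies.
    tf = {label: Counter(tok for doc in docs for tok in doc)
          for label, docs in class_docs.items()}
    # Class frequency: in how many classes each term occurs.
    cf = Counter()
    for counts in tf.values():
        cf.update(counts.keys())
    # Weight grows with in-class frequency, shrinks with class frequency.
    return {label: {term: freq * math.log(num_classes / cf[term])
                    for term, freq in counts.items()}
            for label, counts in tf.items()}

# Hypothetical usage: "kernel" occurs in both classes, so its ICF is
# log(2/2) = 0, while class-specific terms receive positive weight.
corpus = {"comp":  [["gpu", "kernel"], ["kernel", "driver"]],
          "sport": [["match", "goal"], ["goal", "kernel"]]}
print(tf_icf_weights(corpus))
```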
